A generic methodology for the statistically uniform & comparable evaluation of Automated Trading Platform components
Although machine learning approaches have been widely, and often very
successfully, used in the field of finance, these approaches remain bespoke to
specific investigations and opaque in terms of explainability, comparability,
and reproducibility. The primary objective of this research was to shed light
upon this field by providing a generic methodology that was
investigation-agnostic and interpretable to a financial markets practitioner,
thus enhancing their efficiency, reducing barriers to entry, and increasing the
reproducibility of experiments. The proposed methodology is showcased on two
automated trading platform components: price levels, a well-known
trading pattern, and a novel two-step feature extraction method. The methodology
relies on hypothesis testing, which is widely applied in the social and
natural sciences to evaluate concrete results beyond
simple classification accuracy. The main hypothesis was formulated to evaluate
whether the selected trading pattern is suitable for use in the machine
learning setting. Across the experiments we found that the use of the
considered trading pattern in the machine learning setting is only partially
supported by statistics: although the null hypothesis could be rejected, the
measured effect sizes for the Rebound 7, Rebound 11, and Rebound 15
configurations were insignificant. We showcased the generic
methodology on a US futures market instrument and provided evidence that with
this methodology we could easily obtain informative metrics beyond the more
traditional performance and profitability metrics. This work is one of the
first to apply this rigorous, statistically backed approach to the field of
financial markets, and we hope it may be a springboard for further research.
Comment: Associated processing files are available at:
https://doi.org/10.5281/zenodo.403685
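As a minimal sketch of the kind of evaluation the abstract describes, a hypothesis test can be paired with an effect-size measure so that statistical significance and practical significance are reported separately. The data, effect size, and thresholds below are synthetic illustrations and are not taken from the paper.

```python
import numpy as np
from scipy import stats

def cohens_d(a, b):
    """Effect size: mean difference scaled by the pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled = np.sqrt(((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1))
                     / (na + nb - 2))
    return (a.mean() - b.mean()) / pooled

rng = np.random.default_rng(0)
# Synthetic per-trade returns: a tiny true edge at price-level rebounds.
rebound = rng.normal(0.02, 1.0, 200_000)
baseline = rng.normal(0.00, 1.0, 200_000)

t_stat, p_value = stats.ttest_ind(rebound, baseline, equal_var=False)
d = cohens_d(rebound, baseline)
# A large sample can reject the null even when the effect size is negligible,
# mirroring the "significant but insignificant effect size" pattern above.
print(f"p = {p_value:.2e}, Cohen's d = {d:.3f}")
```

This illustrates why reporting only the p-value would be misleading here: the null is rejected, yet the effect size stays well below conventional "small" thresholds.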
A methodology for the quantitative evaluation of attacks and mitigations in IoT systems
PhD Thesis. As we move towards a more distributed and unsupervised internet, namely through the
Internet of Things (IoT), the avenues of attack multiply. To compound these issues, whilst
attacks are developing, the current security of devices is much lower than for traditional
systems.
In this thesis I propose a new methodology for white-box behaviour intrusion detection
in constrained systems. I leverage the characteristics of these types of systems, namely their
heterogeneity, distributed nature, and constrained capabilities, to devise a pipeline that, given
a specification of an IoT scenario, can generate an actionable intrusion detection system to
protect it.
I identify key IoT scenarios for which more traditional black-box approaches would
not suffice, and devise means to bypass these limitations. The contributions include: 1) a
survey of intrusion detection for IoT; 2) a modelling technique to observe interactions in IoT
deployments; 3) a modelling approach that focuses on the observation of specific attacks
on possible configurations of IoT devices. Combining these components, a specification
of the system as per contribution 2 and an attack specification as per contribution 3, we can
deploy a bespoke behaviour-based IDS for the specified system. This one-of-a-kind approach
allows for the quick and efficient generation of attack detection from the onset, positioning
this approach as particularly suitable to dynamic and constrained IoT environments.
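As a rough illustration of the pipeline's output, a white-box behaviour IDS can be reduced to checking observed interactions against a whitelist derived from the scenario specification. The device types, actions, and rule shape below are hypothetical placeholders, not the thesis's actual specification language.

```python
# Hypothetical scenario specification: which device types may perform
# which actions towards which other device types.
ALLOWED = {
    ("sensor", "gateway"): {"publish"},
    ("gateway", "cloud"): {"forward"},
}

def make_ids(spec):
    """Generate a checker that flags any interaction outside the spec."""
    def check(src_type, dst_type, action):
        return action in spec.get((src_type, dst_type), set())
    return check

ids = make_ids(ALLOWED)
print(ids("sensor", "gateway", "publish"))    # legitimate interaction
print(ids("sensor", "cloud", "exfiltrate"))   # anomalous, would be flagged
```

A whitelist of this kind only needs the specification, not attack traces, which is why such an approach can protect a deployment "from the onset".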
Escaping mediocrity: how two-layer networks learn hard single-index models with SGD
This study explores the sample complexity for two-layer neural networks to
learn a single-index target function under Stochastic Gradient Descent (SGD),
focusing on the challenging regime where many flat directions are present at
initialization. It is well-established that in this scenario a sample size
growing with the input dimension is typically needed. However, we provide precise results concerning
the pre-factors in high-dimensional contexts and for varying widths. Notably,
our findings suggest that overparameterization can only enhance convergence by
a constant factor within this problem class. These insights are grounded in the
reduction of SGD dynamics to a stochastic process in lower dimensions, where
escaping mediocrity equates to calculating an exit time. Yet, we demonstrate
that a deterministic approximation of this process adequately represents the
escape time, implying that the role of stochasticity may be minimal in this
scenario.
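The reduction described above, where escaping mediocrity equates to computing an exit time and a deterministic approximation of the low-dimensional process suffices, can be illustrated with a toy one-dimensional overlap dynamics. The drift and noise level below are illustrative assumptions, not the paper's derived equations.

```python
import numpy as np

def escape_time(m0=1e-2, target=0.5, dt=1e-3, sigma=0.0, rng=None):
    """Time for the overlap m to escape the flat region around m = 0."""
    m, t = m0, 0.0
    while abs(m) < target:
        drift = m * (1.0 - m**2)  # toy gradient-flow drift for the overlap
        noise = sigma * np.sqrt(dt) * rng.standard_normal() if sigma else 0.0
        m += drift * dt + noise
        t += dt
    return t

rng = np.random.default_rng(1)
deterministic = escape_time()  # noiseless ODE approximation
stochastic = np.mean([escape_time(sigma=0.05, rng=rng) for _ in range(20)])
print(deterministic, stochastic)
```

In this toy setting the noiseless exit time closely tracks the average stochastic one, echoing the claim that stochasticity plays a minimal role in the escape.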
A VST and VISTA study of globular clusters in NGC253
Aims. We analyze the properties of the sources in NGC253 to define an
up-to-date catalog of GC candidates in the galaxy. Methods. Our analysis is based
on the science verification data of two ESO survey telescopes, VST and VISTA.
Using ugri photometry from VST and JKs from VISTA, GC candidates were selected
using the morpho-photometric and color properties of spectroscopically
confirmed GCs available in the literature. The strength of the results was
verified against available archival HST/ACS data from the GHOSTS survey.
Results. The adopted GC selection leads to the definition of a sample of ~350
GC candidates. At visual inspection, we find that 82 objects match all the
requirements for selecting GC candidates, 155 are flagged as uncertain GC
candidates, and 110 are unlikely GCs, most likely background galaxies. Furthermore,
our analysis shows that four of the previously spectroscopically confirmed GCs,
i.e., ~20% of the total spectroscopic sample, are more likely either background
galaxies or high-velocity Milky Way stars. The radial density profile of the
selected best candidates shows the typically observed r^{1/4}-law radial profile.
The analysis of the color distributions reveals only marginal evidence of the
presence of color bimodality, which is normally observed in galaxies of similar
luminosity. The GC luminosity function does not show the typical symmetry,
mainly because of the lack of bright GCs. Part of the bright GCs missing might
be at very large galactocentric distances or along the line of sight of the
galaxy dusty disk. Conclusions. Using ugriJKs photometry we purged the list of
GCs with spectroscopic membership and photometric GC candidates in NGC 253. Our
results show that the use of either spectroscopic or photometric data only does
not generally ensure a contaminant-free sample and a combination of both
spectroscopy and photometry is preferred.
Comment: 24 pages, 15 figures, accepted by Astronomy and Astrophysics
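The color-based selection step can be sketched as a set of box cuts in color space around the loci of confirmed GCs. The boundaries below are hypothetical placeholders chosen for illustration, not the cuts actually calibrated in the paper.

```python
import numpy as np

# Hypothetical color cuts around confirmed-GC loci (illustrative values).
CUTS = {"u-g": (0.8, 2.2), "g-i": (0.4, 1.4), "J-Ks": (0.4, 1.0)}

def select_gc_candidates(u, g, i, j, ks):
    """Boolean mask: True where all colors fall inside the GC boxes."""
    colors = {"u-g": u - g, "g-i": g - i, "J-Ks": j - ks}
    mask = np.ones_like(u, dtype=bool)
    for name, (lo, hi) in CUTS.items():
        mask &= (colors[name] > lo) & (colors[name] < hi)
    return mask

# Two synthetic sources: one GC-like, one background-galaxy-like.
u  = np.array([21.0, 22.0])
g  = np.array([19.8, 21.8])
i  = np.array([19.0, 21.7])
j  = np.array([18.5, 21.0])
ks = np.array([17.9, 20.9])
print(select_gc_candidates(u, g, i, j, ks))
```

Combining optical (ugri) and near-infrared (JKs) colors in this way is what lets the joint selection reject contaminants that pass any single-band cut.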
Modelling Load-Changing Attacks in Cyber-Physical Systems
Cyber-Physical Systems (CPS) are present in many settings addressing a myriad of purposes. Examples are Internet-of-Things (IoT) or sensing software embedded in appliances, or even specialised meters that measure and respond to electricity demands in smart grids. Due to their pervasive nature, they are usually chosen as recipients for larger-scope cyber-security attacks. Those promote system-wide disruptions and are directed towards one key aspect such as confidentiality, integrity, availability, or a combination of those characteristics. Our paper focuses on a particular and distressing attack where coordinated, malware-infected IoT units are maliciously employed to synchronously turn on or off high-wattage appliances, affecting the grid's primary control management. Our model could be extended to larger (smart) grids, Active Buildings, as well as similar infrastructures. Our approach models Coordinated Load-Changing Attacks (CLCA), also referred to as GridLock or BlackIoT, against a theoretical power grid containing various types of power plants. It employs Continuous-Time Markov Chains where elements such as power plants and botnets are modelled under normal or attack situations to evaluate the effect of CLCA on power-reliant infrastructures. We showcase our modelling approach in the scenario of a power supplier (e.g. a power plant) being targeted by a botnet. We demonstrate how our modelling approach can quantify the impact of a botnet attack and be abstracted for any CPS involving power load management in a smart grid. Our results show that by prioritising the type of power plants, the impact of the attack may change: in particular, we find the most impactful attack times and show how different strategies affect their success. We also find the best power generator to use depending on the current demand and the strength of the attack.
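The Continuous-Time Markov Chain idea can be illustrated with a minimal two-state chain for a single plant alternating between normal operation and attack. The transition rates are invented for illustration and are not the paper's parameters.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical two-state CTMC for one plant: 0 = normal, 1 = under attack.
ATTACK_RATE, RECOVERY_RATE = 0.2, 1.0  # illustrative rates, per hour

def simulate(horizon=10_000.0):
    """Exponential-holding-time simulation; returns fraction of time attacked."""
    t, state, attacked_time = 0.0, 0, 0.0
    while t < horizon:
        rate = ATTACK_RATE if state == 0 else RECOVERY_RATE
        dwell = rng.exponential(1.0 / rate)
        if state == 1:
            attacked_time += min(dwell, horizon - t)
        t += dwell
        state = 1 - state  # transition to the other state
    return attacked_time / horizon

frac = simulate()
# For a two-state chain the stationary attack probability is
# ATTACK_RATE / (ATTACK_RATE + RECOVERY_RATE) = 1/6 here.
print(frac)
```

In a full model of this kind, each plant type and the botnet would be a component of a larger chain, and metrics such as time-to-disruption follow from the same holding-time mechanics.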
ANTONIO: Towards a Systematic Method of Generating NLP Benchmarks for Verification
Verification of machine learning models used in Natural Language Processing
(NLP) is known to be a hard problem. In particular, many known neural network
verification methods that work for computer vision and other numeric datasets
do not work for NLP. Here, we study technical reasons that underlie this
problem. Based on this analysis, we propose practical methods and heuristics
for preparing NLP datasets and models in a way that renders them amenable to
known verification methods based on abstract interpretation. We implement these
methods as a Python library called ANTONIO that links to the neural network
verifiers ERAN and Marabou. We evaluate the tool on the NLP dataset
R-U-A-Robot, which has been suggested as a benchmark for verifying legally critical NLP
applications. We hope that, thanks to its general applicability, this work will
open novel possibilities for including NLP verification problems in neural
network verification competitions, and will popularise NLP problems within this
community.
Comment: To appear in proceedings of the 6th Workshop on Formal Methods for
ML-Enabled Autonomous Systems (affiliated with CAV 2023)
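The abstract-interpretation machinery that such verifiers rely on can be sketched as interval-bound propagation through a single linear layer and ReLU. This is a generic illustration of the technique, not ANTONIO's actual API; all weights and input intervals are made up.

```python
import numpy as np

def ibp_linear(lo, hi, W, b):
    """Propagate an input box [lo, hi] through y = W x + b soundly."""
    Wp, Wn = np.maximum(W, 0), np.minimum(W, 0)
    # Positive weights pull from the same bound, negative from the opposite.
    return Wp @ lo + Wn @ hi + b, Wp @ hi + Wn @ lo + b

def ibp_relu(lo, hi):
    """ReLU is monotone, so bounds map through elementwise."""
    return np.maximum(lo, 0), np.maximum(hi, 0)

W = np.array([[1.0, -2.0], [0.5, 1.0]])
b = np.array([0.0, -0.1])
# A box around an embedded input, e.g. a perturbed sentence embedding.
lo, hi = np.array([-0.1, 0.2]), np.array([0.1, 0.4])
lo, hi = ibp_relu(*ibp_linear(lo, hi, W, b))
print(lo, hi)
```

Preparing NLP datasets so that meaningful perturbation sets can be expressed as such boxes (or similar abstract domains) is the kind of step the library automates.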
- …